Supporting invocations of the RDTSC (Read Time-Stamp Counter) instruction by guest code within a secure hardware enclave

ABSTRACT

Techniques for supporting invocations of the RDTSC (Read Time-Stamp Counter) instruction, or equivalents thereof, by guest program code running within a virtual machine (VM), including guest program code running within a secure hardware enclave of the VM, are provided. In one set of embodiments, a hypervisor can activate time virtualization heuristics for the VM, where the time virtualization heuristics cause accelerated delivery of system clock timer interrupts to a guest operating system (OS) of the VM. The hypervisor can further determine a scaling factor to be applied to timestamps generated by one or more physical CPUs, where the timestamps are generated in response to invocations of a CPU instruction made by guest program code running within the VM, and where the scaling factor is based on the activated time virtualization heuristics. The hypervisor can then program the scaling factor into the one or more physical CPUs.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application and, pursuant to 35 U.S.C. § 120, is entitled to and claims the benefit of earlier filed application U.S. application Ser. No. 16/822,054 filed Mar. 18, 2020, which claims the benefit under 35 U.S.C. 119(a)-(d) to Foreign Application Serial No. 202041001650 filed in India entitled “SUPPORTING INVOCATIONS OF THE RDTSC (READ TIME-STAMP COUNTER) INSTRUCTION BY GUEST CODE WITHIN A SECURE HARDWARE ENCLAVE”, on Jan. 14, 2020, by VMWARE, Inc., the content of all of which is incorporated herein by reference in their entirety for all purposes.

BACKGROUND

There are several ways in which guest program code running within a virtual machine (VM) can keep track of the VM's system time (i.e., the amount of real-world time that has elapsed from the point of powering-on the VM). One method is to employ a programmable interval timer, known as a system clock timer, that causes interrupts to be generated on a periodic basis which are delivered to the VM's guest operating system (OS). Upon receiving such an interrupt, the guest OS increments a counter value (i.e., system clock counter) that indicates the VM's system time, and this system clock counter can be queried by other guest code via an appropriate guest OS application programming interface (API). For example, if the system clock timer is programmed to generate an interrupt every 10 milliseconds (ms) and the system clock counter has a value of 1000, that means a total of 10×1000=10,000 ms (or 10 seconds) have passed since the time of VM power-on.

Another way in which guest code can track/determine VM system time is to employ the RDTSC (Read Time-Stamp Counter) instruction that is implemented by central processing units (CPUs) based on the x86 CPU architecture. When guest code calls the RDTSC instruction, the physical CPU mapped to the VM's virtual CPU (vCPU) writes a hardware-derived timestamp value (referred to herein as an RDTSC timestamp) into two vCPU registers. This RDTSC timestamp indicates the number of physical CPU clock cycles, or ticks, that have occurred since the time of host system power-on/reset (subject to an offset specified by the host system's hypervisor to account for the time at which the VM was powered-on). Thus, this timestamp can be considered reflective of the amount of real-world time that has transpired during that period. The guest code that invoked the RDTSC instruction can then retrieve the RDTSC timestamp from the appropriate vCPU registers and thereby determine the current VM system time.

To account for scenarios in which the physical compute resources of a host system become overcommitted (i.e., scenarios where the number of running vCPUs exceeds the number of available physical CPUs), some existing hypervisors implement a time virtualization heuristics module that uses heuristics to intelligently accelerate the delivery of system clock timer interrupts to VMs which have been de-scheduled and subsequently re-scheduled on the host system's physical CPU(s) due to CPU starvation/contention. This accelerated interrupt delivery ensures that the system time of such VMs (as determined via their system clock counters) eventually catches up to real-world time.

In addition, hardware virtualization technologies such as Intel VT and AMD-V provide the capability to intercept RDTSC instructions invoked by VMs. When time virtualization heuristics are active, existing hypervisors make use of this capability to (1) trap an RDTSC instruction call made by VM guest code, (2) emulate execution of the RDTSC instruction (i.e., generate an RDTSC timestamp in software rather than via the CPU hardware), and (3) provide the software-generated RDTSC timestamp to the calling guest code. This RDTSC trapping and emulation mechanism allows the hypervisor to provide an RDTSC timestamp to the calling guest code that is consistent with the hypervisor-level time virtualization heuristics applied to the VM's system clock timer interrupts, and thus ensures that the VM as a whole has a coherent view of its system time from both clock sources.

However, a significant complication with the foregoing is that the RDTSC instruction may be invoked by guest code running within a secure hardware enclave of the VM. In these cases, the hypervisor cannot provide an emulated RDTSC timestamp to the calling guest code because the state of that guest code is isolated from, and thus cannot be modified by, the hypervisor. As a result, the hypervisor cannot ensure that the RDTSC timestamps received by the secure hardware enclave guest code will be consistent with time virtualization heuristics applied to the VM's system clock timer interrupts.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a host system in accordance with certain embodiments.

FIG. 2 depicts a first workflow for supporting RDTSC invocations via a TSC scaling approach according to certain embodiments.

FIG. 3 depicts a second workflow for supporting RDTSC invocations via the TSC scaling approach according to certain embodiments.

FIG. 4 depicts a workflow for supporting RDTSC invocations via a heuristics suppression approach according to certain embodiments.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous examples and details are set forth in order to provide an understanding of various embodiments. It will be evident, however, to one skilled in the art that certain embodiments can be practiced without some of these details, or can be practiced with modifications or equivalents thereof.

1. Overview

Embodiments of the present disclosure are directed to techniques that may be implemented by a hypervisor of a host system for supporting invocations of the RDTSC instruction (or equivalents thereof) by guest program code running within a secure hardware enclave of a VM. As used herein, a secure hardware enclave (also known as a hardware-assisted trusted execution environment or TEE) is a region of computer system memory, allocated via a special set of CPU instructions, where user-world code can run in a manner that is isolated from other processes running in other memory regions (including those running at higher privilege levels, such as a VM's guest OS or the hypervisor). Thus, secure hardware enclaves enable secure and confidential computation. Examples of existing technologies that facilitate the creation and use of secure hardware enclaves include SGX (Software Guard Extensions) for Intel CPUs and TrustZone for ARM CPUs.

At a high level, the techniques of the present disclosure guarantee that invocations of the RDTSC instruction made by guest code within a VM's secure hardware enclave do not result in a conflict between (1) the guest code's understanding of VM system time as determined using the RDTSC instruction, and (2) the guest code's understanding of VM system time as determined using the VM's system clock timer/counter, which may be manipulated by the hypervisor via time virtualization heuristics. These and other aspects are described in further detail in the sections that follow.

2. Example Host System

FIG. 1 is a simplified block diagram of a host system 100 in which embodiments of the present disclosure may be implemented. As shown, host system 100 includes a software layer comprising a hypervisor 102 and a number of VMs 104(1)-(J) and a hardware layer comprising a number of physical CPUs 106(1)-(K). In various embodiments, hypervisor 102 can virtualize the hardware resources of host system 100, including physical CPUs 106(1)-(K), and allocate these virtualized hardware resources to VMs 104(1)-(J) so that the VMs can carry out guest application workloads.

As mentioned previously, there are two different types of timekeeping systems which guest program code running within VMs 104(1)-(J) can rely on in order to track/determine the system time of their respective VMs. The first type of timekeeping system uses a VM-level system clock counter which is incremented based on interrupts that are delivered in accordance with a programmable system clock timer. The second type of timekeeping system uses the RDTSC CPU instruction which is supported by x86 CPUs (and is assumed to be supported by physical CPUs 106(1)-(K)).

In a virtualized host system like system 100 of FIG. 1, it is fairly common for the host system's physical CPUs to become temporarily overcommitted. In scenarios where the host system's VMs use the first type of timekeeping system noted above (i.e., a system based on system clock timer interrupts), CPU overcommitment problems can prevent system clock timer interrupts from being delivered to a VM's guest OS in a timely fashion. This, in turn, can cause inconsistencies in the guest OS's understanding of VM system time. To understand this, consider a scenario in which the system clock timer of VM 104(1) is programmed to generate an interrupt every 10 ms and the vCPU of VM 104(1) is placed (i.e., scheduled) on physical CPU 106(1) for execution. Assume that the vCPU runs for 50 ms on physical CPU 106(1), resulting in the delivery of 5 interrupts (one every 10 ms) to the guest OS of VM 104(1) and an increase in the VM's system clock counter to 5. Further assume that after those 50 ms of runtime, the vCPU of VM 104(1) is de-scheduled from physical CPU 106(1) and placed in a “waiting” state for the next 30 ms because the vCPU of another VM needs to run on physical CPU 106(1) (and there are no other physical CPUs available). After the 30 ms have elapsed, assume that the vCPU of VM 104(1) is re-scheduled on physical CPU 106(1) (or another available physical CPU of host system 100) and resumes its execution.

In the foregoing scenario, the guest OS of VM 104(1) cannot receive any interrupts during the 30 ms timeframe in which the VM's vCPU is de-scheduled and in the waiting state because the VM's vCPU is not actually running during that period. Thus, the guest OS will receive its sixth interrupt at the real-world time of 50 ms (initial operation) + 30 ms (waiting state) + 10 ms (timer interval) = 90 ms. However, because the guest OS has only received 6 interrupts, the guest OS will erroneously believe that the correct VM system time at that point is 60 ms (i.e., 6 interrupts × 10 ms per interrupt). In addition, as long as the guest OS continues receiving further interrupts at the programmed interval of 10 ms from that point onward, the guest OS's notion of VM system time (per the system clock counter) will perpetually lag behind real-world time by 30 ms.
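The arithmetic of this scenario can be replayed with a short simulation. The following is a minimal sketch, not part of the disclosed embodiments: the function name `delivery_times` and the modeling choice that an interrupt fires after every 10 ms of accumulated vCPU run time are assumptions made for illustration.

```python
def delivery_times(tick_ms, run_windows, count):
    """Return real-world times (ms) of the first `count` timer interrupts.

    run_windows is a list of (start_ms, end_ms) spans in which the vCPU is
    scheduled. Assumption: an interrupt is delivered each time the vCPU has
    accumulated tick_ms of run time since the previous interrupt.
    """
    times = []
    carried = 0  # run time accumulated since the last interrupt
    for start, end in run_windows:
        t = start
        while len(times) < count and t + (tick_ms - carried) <= end:
            t += tick_ms - carried
            carried = 0
            times.append(t)
        if len(times) == count:
            break
        carried += end - t
    return times


# vCPU runs 0-50 ms, waits 50-80 ms, then runs again from 80 ms onward.
times = delivery_times(tick_ms=10, run_windows=[(0, 50), (80, 200)], count=6)
real_ms_at_sixth = times[-1]            # real-world time of the sixth interrupt
guest_ms_at_sixth = len(times) * 10     # what the system clock counter implies
lag_ms = real_ms_at_sixth - guest_ms_at_sixth
```

Under these assumptions the sixth interrupt lands at 90 ms of real-world time while the counter implies 60 ms, which is the 30 ms lag described above.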

To address this issue, certain existing hypervisors implement a time virtualization heuristics module (shown as module 108 in FIG. 1) that is designed to detect these types of CPU overcommitment scenarios and accelerate the delivery of system clock timer interrupts to the guest OSs of “lagging” VMs like VM 104(1). Through this accelerated interrupt delivery, module 108 can enable a lagging VM's system clock counter to eventually catch up, or in other words reach consistency, with real-world time. Once this catch up is achieved, time virtualization heuristics module 108 can be deactivated for that VM until the module determines that the catch-up heuristics are needed again (e.g., the VM misses further interrupts due to its vCPU being de-scheduled and re-scheduled in the future).

For example, with respect to VM 104(1) discussed above, at the time the VM's vCPU is re-scheduled on physical CPU 106(1) after 30 ms in the waiting state, time virtualization heuristics module 108 can detect that the VM's guest OS has effectively missed three interrupts (one interrupt per 10 ms) after the fifth one and thus apply catch-up heuristics that cause the next six interrupts (i.e., the sixth, seventh, eighth, ninth, tenth, and eleventh interrupts) to be delivered to the guest OS at an accelerated pace of every 5 ms, rather than at the programmed pace of every 10 ms. The delivery times of interrupts 6-11 at this accelerated pace are presented in the table below in terms of real-world time and are contrasted against the delivery times for these interrupts that would occur at the normal programmed pace:

TABLE 1

Interrupt    Time of delivery (in real-world    Time of delivery (in real-world
number       time) at accelerated pace of       time) at programmed pace of
             5 ms intervals                     10 ms intervals
Sixth        85 ms                              90 ms
Seventh      90 ms                              100 ms
Eighth       95 ms                              110 ms
Ninth        100 ms                             120 ms
Tenth        105 ms                             130 ms
Eleventh     110 ms                             140 ms

As can be seen in the second table column above, once the eleventh interrupt has been delivered to the guest OS of VM 104(1) at the accelerated pace of 5 ms intervals, the VM's notion of its system time per the system clock counter (i.e., 11 interrupts × 10 ms per interrupt = 110 ms) will have caught up with the real-world time of 110 ms. Thus, time virtualization heuristics module 108 has brought the VM's system time in alignment with real-world time, and module 108 can be deactivated at that point for VM 104(1) (i.e., future interrupts can be delivered to the VM's guest OS at the normal programmed pace of 10 ms) until needed again. If no catch-up heuristics were applied to VM 104(1), the VM's system time would lag behind real-world time by 30 ms at every interrupt point as shown in the third column of Table 1.
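The accelerated column of Table 1 can be regenerated with a few lines of code. This is an illustrative sketch only; the function name `catch_up_schedule` and the stopping rule (accelerate until the counter-derived time is no longer behind real-world time) are assumptions, since the heuristics themselves are not specified here.

```python
def catch_up_schedule(resume_ms, delivered, tick_ms, fast_ms):
    """Return (interrupt_number, real_ms, guest_ms) rows for the catch-up phase.

    Assumption: interrupts are delivered every fast_ms of real-world time,
    starting when the vCPU resumes, until guest time equals real-world time.
    """
    rows = []
    real = resume_ms
    n = delivered
    while n * tick_ms < real:
        real += fast_ms          # accelerated delivery interval
        n += 1                   # guest OS increments its system clock counter
        rows.append((n, real, n * tick_ms))
    return rows


# VM 104(1): five interrupts delivered, vCPU resumes at 80 ms of real time.
rows = catch_up_schedule(resume_ms=80, delivered=5, tick_ms=10, fast_ms=5)
for number, real_ms, guest_ms in rows:
    print(f"interrupt {number}: real {real_ms} ms, guest {guest_ms} ms")
```

The last row produced is interrupt 11 at 110 ms of real-world time with a guest time of 110 ms, the point at which the heuristics can be deactivated.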

One consequence of implementing time virtualization heuristics module 108 is that the RDTSC instruction may return timestamp values that conflict (or in other words, are out of sync) with the system clock counter of a VM when module 108 is enabled/active with respect to that VM. This is problematic because some guest OSs leverage the RDTSC instruction in combination with the system clock counter for higher resolution system time readings. For example, because the frequency at which system clock timer interrupts are generated is relatively low (e.g., every 10 ms), a guest OS can determine a more accurate time reading by calculating (time from system clock counter) + (RDTSC time elapsed since the last system clock timer interrupt). Thus, it is desirable to ensure that the VM's system clock counter remains in sync with the RDTSC timestamps returned by the RDTSC instruction when time virtualization heuristics are active.
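The combined reading described above can be written out as a small calculation. This is a hedged sketch: the function name, the parameter names, and the 2 GHz tick rate in the example are hypothetical and are not taken from any particular guest OS.

```python
def high_res_time_ms(counter, tick_ms, tsc_now, tsc_at_last_tick, tsc_hz):
    """Combine the coarse system clock counter with a fine RDTSC delta.

    counter           number of timer interrupts received so far
    tick_ms           programmed interval between timer interrupts
    tsc_now           RDTSC timestamp read just now (in ticks)
    tsc_at_last_tick  RDTSC timestamp recorded at the last timer interrupt
    tsc_hz            assumed constant TSC frequency in ticks per second
    """
    coarse_ms = counter * tick_ms
    fine_ms = (tsc_now - tsc_at_last_tick) * 1000 / tsc_hz
    return coarse_ms + fine_ms


# Counter of 1000 at 10 ms per tick, plus 6,000,000 ticks at an assumed 2 GHz.
reading = high_res_time_ms(
    counter=1000,
    tick_ms=10,
    tsc_now=20_006_000_000,
    tsc_at_last_tick=20_000_000_000,
    tsc_hz=2_000_000_000,
)
```

If the two clock sources disagree, the fine-grained term no longer measures time since the last interrupt as the guest understands it, which is why the text stresses keeping them in sync.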

The foregoing is currently achieved via a hypervisor-level RDTSC trap handler and corresponding RDTSC emulator (shown via reference numerals 110 and 112 in FIG. 1). Generally speaking, when time virtualization heuristics module 108 enables catch-up heuristics for a given VM 104, module 108 also enables a trap bit that causes all calls to the RDTSC instruction by guest code within the VM to be trapped by (or in other words, cause an exit to) hypervisor 102. This is in contrast to the normal sequence of operation in which a guest call to the RDTSC instruction causes the instruction to be directly passed to and handled by a physical CPU 106, without involvement by hypervisor 102.

When the trap/exit to hypervisor 102 occurs, hypervisor 102 passes control to RDTSC trap handler 110, which in turn uses RDTSC emulator 112 to emulate the invoked RDTSC instruction in software and thereby generate an RDTSC timestamp that takes into account the current state of time virtualization heuristics module 108 with respect to the subject VM. RDTSC trap handler 110 then writes the emulated RDTSC timestamp to one or more vCPU registers of the VM and control returns to the calling guest code, which retrieves the emulated RDTSC timestamp from the vCPU register(s) and continues with its execution. Finally, once the catch-up heuristics applied via time virtualization heuristics module 108 have enabled the VM's system clock counter to catch up with real-world time, module 108 unsets the trap bit. This causes further RDTSC instruction calls from within the VM to be once again handled directly by a physical CPU 106 in accordance with the normal mode of operation.

For instance, in the previously discussed example of VM 104(1), assume guest code within the VM calls the RDTSC instruction at the time of delivery of the seventh interrupt (i.e., at the real-world time of 90 ms). If the RDTSC instruction were passed directly through to a physical CPU 106, the physical CPU would return a hardware-derived RDTSC timestamp reflecting the real-world time of 90 ms. However, that would be undesirable because the guest OS of VM 104(1) has only received seven interrupts at that point, and thus the guest OS believes the total elapsed time is 70 ms per its system clock counter. Thus, in this scenario RDTSC trap handler 110/emulator 112 can recognize that the VM's system clock counter indicates a system time of 70 ms (in accordance with the currently applied heuristics) and return a consistent RDTSC timestamp of 70 ms to the calling guest code.
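The trap-and-emulate decision in this example can be sketched as follows. The class and function names are hypothetical, timestamps are expressed in milliseconds rather than CPU ticks for readability, and the emulated value is simply the counter-derived time at an interrupt boundary; a real emulator would presumably also account for time between interrupts.

```python
from dataclasses import dataclass


@dataclass
class VmTimeState:
    tick_ms: int                 # programmed timer interval
    system_clock_counter: int    # interrupts delivered to the guest OS so far
    trap_bit_set: bool           # True while catch-up heuristics are active


def rdtsc_result_ms(vm: VmTimeState, real_ms: int) -> int:
    """Return the time value an RDTSC call would reflect for this VM."""
    if vm.trap_bit_set:
        # Trapped: emulate in software, consistent with the system clock counter.
        return vm.system_clock_counter * vm.tick_ms
    # Not trapped: the physical CPU answers with hardware-derived time.
    return real_ms


vm = VmTimeState(tick_ms=10, system_clock_counter=7, trap_bit_set=True)
emulated = rdtsc_result_ms(vm, real_ms=90)      # consistent with the counter

vm.trap_bit_set = False
passthrough = rdtsc_result_ms(vm, real_ms=90)   # raw hardware view
```

With the trap bit set the call reflects 70 ms, matching the guest's counter; without it the call reflects the real-world 90 ms.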

As noted in the Background section, a significant complication with performing hypervisor-level trapping and emulation of the RDTSC instruction is that, in some cases, this instruction may be called by guest code running within a secure hardware enclave of a VM. An example of such a secure hardware enclave is shown via reference numeral 114 within VM 104(1) of FIG. 1. In these cases, the state of the guest code within the secure hardware enclave will be opaque/inaccessible to hypervisor 102. As a result, it is not possible for RDTSC trap handler 110 to provide an emulated RDTSC timestamp to the calling guest code, because handler 110 cannot write the timestamp to the VM's vCPU register(s) in a manner that is visible within the enclave.

To address this and other similar problems, in various embodiments time virtualization heuristics module 108 and/or RDTSC trap handler 110 can be enhanced in a manner that guarantees RDTSC invocations made by guest code within a VM's secure hardware enclave return RDTSC timestamps that are always consistent with the VM's system clock counter. For example, in one set of embodiments (referred to herein as the TSC scaling approach and detailed in section (3) below), components 108 and/or 110 can be modified to leverage the TSC (Time-Stamp Counter) scaling feature available on many modern x86 CPU designs to modify, via physical CPUs 106(1)-(K), the hardware-derived RDTSC timestamps that are returned to secure hardware enclave guest code in accordance with module 108's time virtualization heuristics (rather than performing this modification in software via RDTSC emulator 112). In certain embodiments, this TSC scaling approach can completely avoid the need for hypervisor 102 to explicitly trap and emulate RDTSC instruction calls made from within a VM's secure hardware enclave (or more generally, from anywhere within the VM) in order to achieve consistency with the accelerated interrupt delivery performed by time virtualization heuristics module 108.

In another set of embodiments (referred to herein as the heuristics suppression approach and detailed in section (4) below), components 108 and/or 110 can be modified to (1) deactivate the time virtualization heuristics applied to a given VM when an invocation of the RDTSC instruction by guest code within a secure hardware enclave of that VM is trapped by hypervisor 102 (thereby effectively dropping the interrupts that would have otherwise been delivered to the VM if the VM had not been de-scheduled), (2) move forward the VM's system clock counter to match real-world time, and (3) disable any further RDTSC trapping for the VM until module 108 determines that time virtualization heuristics should be reactivated. The combination of steps (1) and (2) ensures that, at the time of the initial RDTSC exit/trap, the VM's system clock counter will be brought in alignment with RDTSC timestamps generated directly by the CPU hardware. Accordingly, hypervisor 102 can refrain from trapping and emulating any further RDTSC instruction calls made from within the secure hardware enclave at that point, until an event occurs that causes time virtualization heuristics for the VM to be reactivated. With this approach, the VM's guest OS will observe a sudden jump forward in system time due to the system clock counter being moved forward; however, this approach is relatively easy to implement and thus can be opportunistically employed for scenarios where it is unlikely that the RDTSC instruction will be called by guest code within a secure hardware enclave.

It should be appreciated that the architecture of host system 100 shown in FIG. 1 is illustrative and not intended to limit embodiments of the present disclosure. For instance, although FIG. 1 depicts a particular arrangement for the components of host system 100, other arrangements are possible (e.g., the functionality attributed to a particular component may be split into multiple components, components may be combined, etc.). Further, host system 100 may include other components/sub-components and/or implement other functions that are not specifically described. One of ordinary skill in the art will recognize other variations, modifications, and alternatives.

3. TSC Scaling

As indicated above, TSC scaling is a feature found on modern x86 CPU designs which allows a hypervisor (or other program code) to set, in CPU hardware, a scaling factor to be applied to all RDTSC timestamps generated by that hardware for a given context (e.g., a given VM). With the TSC scaling approach, hypervisor 102 of FIG. 1 can leverage this feature to achieve consistency between the RDTSC timestamps generated by physical CPUs 106(1)-(K) and the VM system clock interrupts/counters manipulated by time virtualization heuristics module 108. This, in turn, obviates the need for hypervisor 102 to emulate, via RDTSC emulator 112, RDTSC instruction calls originating from secure hardware enclave guest code within VMs 104(1)-(J).

FIG. 2 depicts a first workflow 200 for implementing the TSC scaling approach according to certain embodiments. In this first implementation, TSC scaling is enabled as soon as time virtualization heuristics are activated for a VM 104. Thus, there is no need to trap RDTSC instruction calls originating from the VM at all, which improves overall VM performance by eliminating the overhead associated with exits to the hypervisor.

Starting with block 202, time virtualization heuristics module 108 can detect the occurrence of an event/scenario that indicates time virtualization heuristics (e.g., accelerated interrupt delivery) should be activated with respect to a given VM 104 and can activate the heuristics accordingly. For example, module 108 may detect that VM 104 has been de-scheduled and re-scheduled on a physical CPU 106, which has caused the VM's guest OS to miss one or more system clock timer interrupts.

At block 204, upon activating the time virtualization heuristics for VM 104, module 108 can determine a scaling factor that should be applied to RDTSC timestamps returned to guest code within VM 104 (such as, e.g., guest code running within a secure hardware enclave of the VM) based on the activated heuristics. For instance, if the activated heuristics cause module 108 to accelerate the delivery of interrupts to the VM's guest OS by a factor of 2, module 108 may determine that RDTSC timestamps for the VM should be cut in half (subject to an appropriate offset). Generally speaking, the goal of the scaling factor determined at block 204 is to ensure that any hardware-based RDTSC timestamps scaled using this scaling factor will reflect a VM system time that is consistent with the VM's system clock counter, per the accelerated interrupt delivery applied via module 108. Although not explicitly shown in FIG. 2, as part of block 204, time virtualization heuristics module 108 may also determine an offset to be applied to the RDTSC timestamps per conventional virtualized RDTSC implementations.

Once time virtualization heuristics module 108 has determined the scaling factor, module 108 can invoke an appropriate CPU instruction for programming this scaling factor into the physical CPU 106 that is mapped to the vCPU of VM 104, in accordance with the physical CPU's TSC scaling feature (block 206). The result of this step is that all RDTSC timestamps generated by that physical CPU from that point onward will be scaled per the scaling factor.
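To make the effect of a programmed scaling factor concrete, the sketch below models scaling in software as a fixed-point multiply followed by an offset. The 48 fractional bits and the multiply, shift, then add-offset order follow how hardware TSC scaling is commonly documented for Intel VT-x (AMD's TSC ratio uses a different width); treat these details, and the function names, as assumptions to be checked against the vendor manuals rather than as part of the described embodiments.

```python
MASK_64 = (1 << 64) - 1


def encode_scaling_factor(factor: float, frac_bits: int = 48) -> int:
    """Encode a real-valued scaling factor as a fixed-point multiplier."""
    return round(factor * (1 << frac_bits))


def scaled_timestamp(host_tsc: int, multiplier: int, offset: int,
                     frac_bits: int = 48) -> int:
    """Model the timestamp a guest would read with scaling enabled.

    Assumed order of operations: multiply the host timestamp by the
    fixed-point multiplier, shift out the fractional bits, add the offset,
    and wrap to 64 bits.
    """
    return (((host_tsc * multiplier) >> frac_bits) + offset) & MASK_64


# The "cut in half" example from the text: factor 0.5 plus an arbitrary offset.
half = encode_scaling_factor(0.5)
guest_view = scaled_timestamp(host_tsc=1_000_000, multiplier=half, offset=100)

# A factor of 1.0 leaves the timestamp unscaled (offset still applies).
unscaled = scaled_timestamp(1_000_000, encode_scaling_factor(1.0), 0)
```

Because the multiply happens in the CPU, guest code inside an enclave reads the scaled value directly, without any hypervisor write to its registers.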

Then, at some future point in time, time virtualization heuristics module 108 can determine that the system clock counter for VM 104 has caught up with real-world time, and thus can deactivate the heuristics previously activated for VM 104 at block 202 (block 208). Finally, in response to deactivating the heuristics, time virtualization heuristics module 108 can also disable the TSC scaling programmed/enabled at block 206, which will cause physical CPU 106 to generate future RDTSC timestamps for VM 104 in an unscaled fashion (block 210), and subsequently return to block 202 in order to repeat the process as needed. Note that throughout the entirety of workflow 200, time virtualization heuristics module 108 does not set the trap bit for trapping RDTSC instruction calls to hypervisor 102. Thus, any invocations of the RDTSC instruction made by guest code within VM 104 will always be handled directly by the CPU hardware of the host system, without the involvement of hypervisor 102.
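Blocks 202 through 210 can be summarized as a small state model. This is a sketch under stated assumptions: `PhysicalCpu`, `VmTimeState`, and the helper functions are hypothetical stand-ins for hypervisor internals, and `scaling_factor_for` merely mirrors the text's single example (a speedup of 2 yields halved timestamps) rather than a specified formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalCpu:
    tsc_scaling_factor: Optional[float] = None  # None means scaling disabled
    rdtsc_trap_enabled: bool = False            # never set in workflow 200


@dataclass
class VmTimeState:
    heuristics_active: bool = False


def scaling_factor_for(speedup: float) -> float:
    # Placeholder policy taken from the text's example only (assumption).
    return 1.0 / speedup


def activate(vm: VmTimeState, cpu: PhysicalCpu, speedup: float) -> None:
    vm.heuristics_active = True                           # block 202
    cpu.tsc_scaling_factor = scaling_factor_for(speedup)  # blocks 204-206


def deactivate(vm: VmTimeState, cpu: PhysicalCpu) -> None:
    vm.heuristics_active = False                          # block 208
    cpu.tsc_scaling_factor = None                         # block 210


vm, cpu = VmTimeState(), PhysicalCpu()
activate(vm, cpu, speedup=2.0)
during = (vm.heuristics_active, cpu.tsc_scaling_factor, cpu.rdtsc_trap_enabled)
deactivate(vm, cpu)
after = (vm.heuristics_active, cpu.tsc_scaling_factor, cpu.rdtsc_trap_enabled)
```

Note that the trap flag stays false across the whole cycle, which is the property that distinguishes this workflow from the trap-based ones.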

FIG. 3 depicts a second workflow 300 for implementing the TSC scaling approach according to certain embodiments. In this second implementation, TSC scaling is enabled at the time an RDTSC exit/trap occurs.

Starting with block 302, time virtualization heuristics module 108 can detect the occurrence of an event/scenario that indicates time virtualization heuristics (e.g., accelerated interrupt delivery) should be activated with respect to a given VM 104 and can activate the heuristics accordingly. In addition, at block 304, time virtualization heuristics module 108 can enable a trap bit that causes all RDTSC invocations from VM 104 (such as from, e.g., guest code running within a secure hardware enclave of the VM) to be trapped by hypervisor 102.

Then, at the time an RDTSC exit/trap occurs (in other words, at the time guest code within VM 104 calls the RDTSC instruction and causes a trap/exit to hypervisor 102), RDTSC trap handler 110 can determine a scaling factor that should be applied to RDTSC timestamps returned to the calling guest code based on the activated heuristics (block 306) and can invoke an appropriate CPU instruction for programming this scaling factor into the physical CPU 106 that is mapped to the vCPU of VM 104, in accordance with the physical CPU's TSC scaling feature (block 308). These two steps are substantially similar to blocks 204 and 206 of workflow 200.

At block 310, RDTSC trap handler 110 can determine at what point in the future the TSC scaling should be disabled. In certain embodiments, this can involve communicating with time virtualization heuristics module 108 to determine when VM 104's system clock counter will be fully caught up with real-world time (or stated another way, when the activated heuristics for VM 104 can be deactivated). Upon determining this, RDTSC trap handler 110 can set a timer (as, e.g., a background process/thread within hypervisor 102) for automatically disabling the TSC scaling at that point in time (block 312).

Finally, at block 314, RDTSC trap handler 110 can disable the trap bit for VM 104 previously set at block 304 and workflow 300 can return to block 302 in order to repeat the process as needed.
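The ordering of blocks 302 through 314 can be sketched in the same style. All names here are hypothetical, the hypervisor timer is modeled as a stored deadline checked by `on_timer`, and the scaling policy again just reuses the halving example from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalCpu:
    tsc_scaling_factor: Optional[float] = None  # None means scaling disabled
    rdtsc_trap_enabled: bool = False
    scaling_off_at_ms: Optional[int] = None     # stand-in for a hypervisor timer


def activate_heuristics(cpu: PhysicalCpu) -> None:
    cpu.rdtsc_trap_enabled = True               # blocks 302-304


def on_rdtsc_trap(cpu: PhysicalCpu, speedup: float, caught_up_at_ms: int) -> None:
    cpu.tsc_scaling_factor = 1.0 / speedup      # blocks 306-308 (assumed policy)
    cpu.scaling_off_at_ms = caught_up_at_ms     # blocks 310-312
    cpu.rdtsc_trap_enabled = False              # block 314


def on_timer(cpu: PhysicalCpu, now_ms: int) -> None:
    # Disable scaling once the previously computed catch-up point is reached.
    if cpu.scaling_off_at_ms is not None and now_ms >= cpu.scaling_off_at_ms:
        cpu.tsc_scaling_factor = None
        cpu.scaling_off_at_ms = None


cpu = PhysicalCpu()
activate_heuristics(cpu)
trap_armed = cpu.rdtsc_trap_enabled
on_rdtsc_trap(cpu, speedup=2.0, caught_up_at_ms=110)
on_timer(cpu, now_ms=100)
mid_factor = cpu.tsc_scaling_factor
on_timer(cpu, now_ms=110)
end_factor = cpu.tsc_scaling_factor
```

Only the first RDTSC call after activation pays the cost of an exit; later calls are served by the hardware with scaling applied until the timer fires.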

It should be appreciated that workflows 200 and 300 of FIGS. 2 and 3 are illustrative and various modifications are possible. For example, in some embodiments, time virtualization heuristics module 108 may dynamically change the rate at which it delivers interrupts to a given VM while heuristics for the VM are active. This may be desirable if, for example, the VM is de-scheduled again during that time period, in which case module 108 may want to increase the rate at which interrupts are delivered in order to prevent the VM's system clock counter from falling further behind. In these embodiments, rather than determining and programming a single scaling factor for the VM, time virtualization heuristics module 108 may periodically adjust and re-program this scaling factor to account for any changes in the interrupt delivery rate made by the module.

4. Heuristics Suppression

FIG. 4 depicts a workflow 400 for implementing the heuristics suppression approach noted in section (2) according to certain embodiments. With this approach, the heuristics activated for a VM by module 108 are suppressed at the time an RDTSC exit/trap is taken and the VM's system clock counter is set forward to match real-world time, thereby ensuring that hardware-generated RDTSC timestamps are consistent with the system clock counter. In addition, any system clock timer interrupts that were not delivered to the VM due to, e.g., the VM being de-scheduled are effectively dropped (i.e., they are skipped over and never delivered).

Starting with block 402, time virtualization heuristics module 108 can detect the occurrence of an event/scenario that indicates time virtualization heuristics (e.g., accelerated interrupt delivery) should be activated with respect to a given VM 104 and can activate the heuristics accordingly. In addition, at block 404, time virtualization heuristics module 108 can enable a trap bit that causes all RDTSC invocations from VM 104 (such as from, e.g., guest code running within a secure hardware enclave of the VM) to be trapped by hypervisor 102.

Then, at the time an RDTSC exit/trap occurs (in other words, at the time guest code within VM 104 calls the RDTSC instruction and causes a trap/exit to hypervisor 102), RDTSC trap handler 110 can instruct time virtualization heuristics module 108 to discard its internal state with respect to VM 104, thereby deactivating/suppressing the heuristics for the VM (block 406). This means that module 108 will no longer attempt to deliver to VM 104 the system clock timer interrupts that the VM had missed while in an inactive/de-scheduled state. RDTSC trap handler 110 can further move forward VM 104's system clock counter to match real-world time (block 408). For example, if the VM's system clock counter is currently set to 5 (i.e., 50 ms, assuming 10 ms per interrupt) but the real-world elapsed time from the point of VM power-on is 90 ms, RDTSC trap handler 110 can move forward the system clock counter to 9. In a particular embodiment, this can be achieved by issuing a remote procedure call (RPC) to an agent running within VM 104 for a forward clock correction. Upon being invoked, the agent can adjust the VM's system clock counter accordingly.

Finally, at block 410, RDTSC trap handler 110 can disable the trap bit for VM 104 previously set at block 404 and workflow 400 can return to block 402 in order to repeat the process as needed.
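Blocks 406 through 410 amount to three state changes on the first trapped RDTSC call. The sketch below models them directly on a hypothetical `VmTimeState` record; in the embodiment described above the counter update would be carried out by an in-guest agent via RPC, which is not modeled here, and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class VmTimeState:
    tick_ms: int
    system_clock_counter: int
    heuristics_active: bool = False
    rdtsc_trap_enabled: bool = False
    pending_interrupts: int = 0   # missed interrupts still queued for catch-up


def on_rdtsc_trap(vm: VmTimeState, real_elapsed_ms: int) -> None:
    # Block 406: discard heuristics state; missed interrupts are dropped.
    vm.heuristics_active = False
    vm.pending_interrupts = 0
    # Block 408: move the system clock counter forward to real-world time.
    vm.system_clock_counter = real_elapsed_ms // vm.tick_ms
    # Block 410: stop trapping RDTSC until heuristics are reactivated.
    vm.rdtsc_trap_enabled = False


# Counter at 5 (50 ms) while 90 ms of real-world time has elapsed.
vm = VmTimeState(tick_ms=10, system_clock_counter=5, heuristics_active=True,
                 rdtsc_trap_enabled=True, pending_interrupts=4)
on_rdtsc_trap(vm, real_elapsed_ms=90)
```

After the call the counter reads 9, so counter-derived time and hardware RDTSC timestamps agree, at the cost of the guest seeing a one-time jump forward.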

Certain embodiments described herein involve a hardware abstraction layer on top of a host system (i.e., computer). The hardware abstraction layer allows multiple containers to share the hardware resources. These containers, isolated from each other, have at least a user application running therein. The hardware abstraction layer thus provides benefits of resource isolation and allocation among the containers. In the foregoing embodiments, virtual machines (VMs) are used as an example for the containers and hypervisors as an example for the hardware abstraction layer. As described above, each virtual machine includes a guest operating system in which at least one application runs. It should be noted that these embodiments may also apply to other examples of containers, such as containers not including a guest operating system, referred to herein as “OS-less containers” (see, e.g., www.docker.com). OS-less containers implement operating system-level virtualization, wherein an abstraction layer is provided on top of the kernel of an operating system on a host computer. The abstraction layer supports multiple OS-less containers each including an application and its dependencies. Each OS-less container runs as an isolated process in userspace on the host operating system and shares the kernel with other containers. The OS-less container relies on the kernel's functionality to make use of resource isolation (CPU, memory, block I/O, network, etc.) and separate namespaces and to completely isolate the application's view of the operating environments. By using OS-less containers, resources can be isolated, services restricted, and processes provisioned to have a private view of the operating system with their own process ID space, file system structure, and network interfaces. Multiple containers can share the same kernel, but each container can be constrained to only use a defined amount of resources such as CPU, memory and I/O.

Further embodiments described herein can employ various computer-implemented operations involving data stored in computer systems. For example, these operations can require physical manipulation of physical quantities; usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they (or representations of them) are capable of being stored, transferred, combined, compared, or otherwise manipulated. Such manipulations are often referred to in terms such as producing, identifying, determining, comparing, etc. Any operations described herein that form part of one or more embodiments can be useful machine operations.

Yet further, one or more embodiments can relate to a device or an apparatus for performing the foregoing operations. The apparatus can be specially constructed for specific required purposes, or it can be a general-purpose computer system selectively activated or configured by program code stored in the computer system. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations. The various embodiments described herein can be practiced with other computer system configurations including handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

Yet further, one or more embodiments can be implemented as one or more computer programs or as one or more computer program modules embodied in one or more non-transitory computer readable storage media. The term non-transitory computer readable storage medium refers to any data storage device that can store data which can thereafter be input to a computer system. The non-transitory computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer system. Examples of non-transitory computer readable media include a hard drive, network attached storage (NAS), read-only memory, random-access memory, flash-based nonvolatile memory (e.g., a flash memory card or a solid state disk), a CD (Compact Disc) (e.g., CD-ROM, CD-R, CD-RW, etc.), a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The non-transitory computer readable media can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

In addition, while described virtualization methods have generally assumed that virtual machines present interfaces consistent with a particular hardware system, persons of ordinary skill in the art will recognize that the methods described can be used in conjunction with virtualizations that do not correspond directly to any particular hardware system. Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, certain virtualization operations can be wholly or partially implemented in hardware.

Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances can be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations can be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component can be implemented as separate components.

As used in the description herein and throughout the claims that follow, “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. These examples and embodiments should not be deemed to be the only embodiments and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Other arrangements, embodiments, implementations and equivalents can be employed without departing from the scope hereof as defined by the claims.

What is claimed is:
 1. A method comprising: activating, by a hypervisor of a host system, time virtualization heuristics for a virtual machine (VM), the time virtualization heuristics causing accelerated delivery of system clock timer interrupts by the hypervisor to a guest operating system (OS) of the VM; upon activating the time virtualization heuristics, trapping, by the hypervisor, an invocation of a RDTSC (Read Time-Stamp Counter) instruction made by guest program code running within the VM; and in response to trapping of the RDTSC instruction: deactivating, by the hypervisor, the time virtualization heuristics for the VM; and advancing, by the hypervisor, a system clock counter of the VM to be consistent with real-world time, wherein the system clock counter indicates a system time of the VM and is based on the system clock timer interrupts delivered by the hypervisor.
 2. The method of claim 1 further comprising, prior to the trapping: enabling a trap bit for trapping, by the hypervisor, invocations of the RDTSC instruction.
 3. The method of claim 2 further comprising, subsequently to advancing the system clock counter: disabling the trap bit.
 4. The method of claim 1 wherein the guest program code comprises program code running within a secure hardware enclave of the VM.
 5. The method of claim 1 wherein advancing the system clock counter causes future RDTSC timestamps generated by one or more physical central processing units (CPUs) of the host system to be consistent with the system clock counter.
 6. The method of claim 1 wherein deactivating the time virtualization heuristics disables the accelerated delivery of the system clock timer interrupts.
 7. The method of claim 1 wherein advancing the system clock counter comprises: issuing a remote procedure call (RPC) to an agent running within the VM requesting a forward clock correction of the system clock counter.
 8. A non-transitory computer readable storage medium having stored thereon program code executable by a hypervisor of a host system, the program code embodying a method comprising: activating time virtualization heuristics for a virtual machine (VM), the time virtualization heuristics causing accelerated delivery of system clock timer interrupts by the hypervisor to a guest operating system (OS) of the VM; upon activating the time virtualization heuristics, trapping an invocation of a RDTSC (Read Time-Stamp Counter) instruction made by guest program code running within the VM; and in response to trapping of the RDTSC instruction: deactivating the time virtualization heuristics for the VM; and advancing a system clock counter of the VM to be consistent with real-world time, wherein the system clock counter indicates a system time of the VM and is based on the system clock timer interrupts delivered by the hypervisor.
 9. The non-transitory computer readable storage medium of claim 8 wherein the method further comprises, prior to the trapping: enabling a trap bit for trapping, by the hypervisor, invocations of the RDTSC instruction.
 10. The non-transitory computer readable storage medium of claim 9 wherein the method further comprises, subsequently to advancing the system clock counter: disabling the trap bit.
 11. The non-transitory computer readable storage medium of claim 8 wherein the guest program code comprises program code running within a secure hardware enclave of the VM.
 12. The non-transitory computer readable storage medium of claim 8 wherein advancing the system clock counter causes future RDTSC timestamps generated by one or more physical central processing units (CPUs) of the host system to be consistent with the system clock counter.
 13. The non-transitory computer readable storage medium of claim 8 wherein deactivating the time virtualization heuristics disables the accelerated delivery of the system clock timer interrupts.
 14. The non-transitory computer readable storage medium of claim 8 wherein advancing the system clock counter comprises: issuing a remote procedure call (RPC) to an agent running within the VM requesting a forward clock correction of the system clock counter.
 15. A host system comprising: one or more physical central processing units (CPUs); a hypervisor; and a non-transitory computer readable medium having stored thereon program code that, when executed, causes the hypervisor to: activate time virtualization heuristics for a virtual machine (VM), the time virtualization heuristics causing accelerated delivery of system clock timer interrupts by the hypervisor to a guest operating system (OS) of the VM; upon activating the time virtualization heuristics, trap an invocation of a RDTSC (Read Time-Stamp Counter) instruction made by guest program code running within the VM; and in response to trapping of the RDTSC instruction: deactivate the time virtualization heuristics for the VM; and advance a system clock counter of the VM to be consistent with real-world time, wherein the system clock counter indicates a system time of the VM and is based on the system clock timer interrupts delivered by the hypervisor.
 16. The host system of claim 15 wherein the program code further causes the hypervisor to, prior to the trapping: enable a trap bit for trapping invocations of the RDTSC instruction.
 17. The host system of claim 16 wherein the program code further causes the hypervisor to, subsequently to advancing the system clock counter: disable the trap bit.
 18. The host system of claim 15 wherein the guest program code comprises program code running within a secure hardware enclave of the VM.
 19. The host system of claim 15 wherein advancing the system clock counter causes future RDTSC timestamps generated by the one or more CPUs to be consistent with the system clock counter.
 20. The host system of claim 15 wherein deactivating the time virtualization heuristics disables the accelerated delivery of the system clock timer interrupts.
 21. The host system of claim 15 wherein the program code that causes the hypervisor to advance the system clock counter comprises program code that causes the hypervisor to: issue a remote procedure call (RPC) to an agent running within the VM requesting a forward clock correction of the system clock counter.